Frontiers in Artificial Intelligence

Frontiers Media SA

Preprints posted in the last 7 days, ranked by how well they match Frontiers in Artificial Intelligence's content profile, based on 18 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Technology acceptance of machine learning in life sciences: the role of hype perception and journal impact factor.

Serrano, A. E.

2026-06-09 health informatics 10.64898/2026.06.03.26354262 medRxiv
Top 0.1%
4.0%
Machine learning (ML) has emerged as a transformative technology across biomedical and life science sectors, with applications spanning drug discovery, medical imaging, genomics, and clinical decision support (Goecks et al., 2020; Patel et al., 2020). Despite exponential growth in ML-related publications, from fewer than 100 articles in 2003 to nearly 25,000 by 2021 (NCBI, 2022), adoption among industry professionals remains uneven and sector-dependent. Understanding what drives or inhibits this adoption is critical for organisations seeking to leverage ML capabilities in research and clinical practice. Technology adoption in organisational contexts has been extensively studied through the Technology Acceptance Model (TAM), originally proposed by Davis (1989) and subsequently extended to incorporate external variables influencing perceived usefulness (PU) and perceived ease of use (PEU) (Venkatesh & Davis, 1996). While TAM has been applied across multiple industries, its application within biomedical and life science contexts remains limited, and the industry-specific factors that shape ML acceptance in this sector have not been systematically examined. Two external variables are particularly relevant to life science professionals. First, the bibliometric journal impact factor (JIF) functions as a cognitive signal of scientific credibility in a sector where evidence-based decision-making is culturally embedded and publication quality serves as a proxy for technological legitimacy (Garfield, 1996). Second, technology hype, operationalised through the Gartner Hype Cycle framework, represents a social influence variable that shapes organisational expectations and investment decisions around emerging technologies (Gartner Inc., 2018). Whether these variables influence ML acceptance among life science professionals, alongside individual knowledge and experience, has not been empirically tested. 
This study addresses that gap by investigating ML technology acceptance among 213 biomedical and life science professionals across EMEA, LATAM, and North America, using a cross-sectional quantitative survey and PLS-SEM analysis. The TAM model is extended with three external variables, JIF, technology hype, and prior knowledge and experience, to test their influence on PU and PEU in this specific professional context. Additionally, the study examines demographic and regional differences in ML acceptance, with particular attention to variation between academic researchers and healthcare professionals. The findings contribute a validated, sector-specific extension of TAM for life sciences, provide actionable insights for organisations seeking to accelerate ML implementation, and establish a framework for future subsector-specific research.

2
Acceptability and Perceptions of Artificial Intelligence in Organized Breast Cancer Screening: A Study of French Women

Jean, A.; Merceron, A.; Le Saux, A.; Mercier, E.; Benillouche, P.

2026-06-09 radiology and imaging 10.64898/2026.06.07.26354883 medRxiv
Top 0.1%
2.5%
This study aims to assess women's perceptions of artificial intelligence (AI) used in breast cancer screening in France by examining their knowledge of AI and the barriers to their participation in organized screening. The results of a survey conducted in June 2025 among a national sample of 2000 women (aged 40-75) reveal limited participation and persistent concerns among women. Nevertheless, despite a low awareness of specific AI applications, a large majority of the women surveyed are very favorable to the use of AI in breast cancer diagnosis, even considering it a lever to increase screening participation.

3
A hierarchical clinical fusion transformer model for personalized opioid treatment: Development and validation in diabetic surgical patients

Naderalvojoud, B.; Sutjiadi, B. J.; Koul, A.; Curtin, C.; Gevaert, O.; Hernandez-Boussard, T.

2026-06-08 health informatics 10.64898/2026.06.04.26353331 medRxiv
Top 0.4%
1.3%
Background Machine learning (ML) models are increasingly used to predict adverse outcomes after surgery. However, most rely on static patient characteristics (e.g., age, comorbidities) and overlook clinician-controlled treatment decisions that can be actively modified at the point of care. Discharge opioid prescribing is a key modifiable, clinician-controlled decision, yet optimizing prescribing choices across multiple adverse outcomes remains underexplored in predictive modeling. This study addresses that gap by introducing a novel ML framework that explicitly separates fixed patient risk factors from modifiable prescribing options to support personalized, risk-informed opioid prescribing decisions. Methods We developed the Hierarchical Clinical Fusion Transformer (HCF-Transformer), an ML model designed to estimate patient-specific risks across four postoperative outcomes: prolonged opioid use (POU), chronic pain (CP), 30-day readmission, and opioid-associated outcomes (OAO). The model constructs patient risk profiles from fixed, non-modifiable baseline factors, followed by a transformer layer. Clinician-controllable discharge opioid regimens are modeled as alternative intervention candidates and fused with the fixed risk representation through a clinical fusion mechanism, enabling assessment and ranking based on predicted risks. A Total Relative Risk (TRR) metric, calibrated to each outcome prediction threshold, guides the recommendation process. We evaluated the model in diabetic surgical patients, a common high-risk population. Results The study included 157,853 unique diabetic surgical patients, with outcome prevalences ranging from 47.2% (POU) to 1.8% (OAO). The HCF-Transformer achieved the highest AUROCs, 0.798 for POU, 0.712 for 30-day readmission, 0.808 for CP, and 0.922 for OAO, outperforming Random Forest, FT-Transformer, and ResNet-based models. 
Compared to these baselines, HCF-Transformer generated more stable and discriminative risk estimates and demonstrated significant variation in TRR scores across discharge opioid options (ANOVA p < .01, eta-squared > .01). This enabled consistent identification of lower-risk regimens tailored to patient-specific profiles. Conclusions The HCF-Transformer introduces a novel hierarchical fusion approach to optimize opioid prescribing by integrating static patient risk profiles with modifiable discharge options. Using transformer-based modeling and a quantifiable TRR metric, the model delivers personalized, risk-aware recommendations. This approach enables data-driven opioid prescribing tailored to individual risk and has the potential to improve postoperative outcomes in high-risk populations. Our findings demonstrate that integrating modifiable factors with structured risk profiles through a transformer-based fusion architecture can enhance decision-support systems, paving the way for more actionable and personalized AI in healthcare.

4
Multivariate Machine Learning Analysis of M-ECG-derived Heart Rate Variability in TBI Veterans, With and Without Comorbid PTSD

Izadysadr, A.; Bagherzadeh, H. S.; Rowland, J.; Martindale, S. L.; Stapleton-Kotloski, J. R.; Godwin, D.

2026-06-08 psychiatry and clinical psychology 10.64898/2026.06.05.26354915 medRxiv
Top 0.5%
1.0%
Traumatic brain injury (TBI) and posttraumatic stress disorder (PTSD) frequently co-occur in Veterans, producing overlapping symptoms and shared autonomic dysregulation. Heart rate variability (HRV) offers a noninvasive measure of autonomic function. Univariate HRV analyses often fail to capture complex, multivariate patterns associated with comorbidity. This study applied machine learning to HRV features extracted from MEG-derived electrocardiogram (M-ECG) signals to differentiate Veterans with TBI alone (TBI-alone; n = 42) from those with comorbid PTSD (TBI+PTSD; n = 40). Time-domain, frequency-domain, geometric, and nonlinear HRV metrics were analyzed using nested cross-validated Random Forest and XGBoost classifiers, with Boruta-based feature selection and SHapley Additive exPlanations for model interpretability. Both classifiers achieved above-chance discrimination (Random Forest AUC = 0.663; XGBoost AUC = 0.635). Multivariate models identified distributed autonomic signatures in TBI+PTSD, including altered sympathovagal balance, increased low-frequency proportion, and greater heart rate complexity. In contrast, univariate HRV differences were subtle and did not survive correction for multiple comparisons. These findings indicate that multivariate machine learning analysis of HRV could help detect comorbidity-specific autonomic patterns, suggesting that HRV-derived signatures may serve as exploratory biomarkers for risk assessment and targeted interventions in Veterans with TBI and PTSD.

5
A Comparison of Manual and Automated Approaches to Developing Computable Algorithms for Identifying Acute Pancreatitis

Bann, M. A.; Carrell, D. S.; Gruber, S.; Heagerty, P. J.; Williamson, B. D.; Nelson, J. C.; Hazlehurst, B.; Felcher, A.; Nyongesa, D. B.; Slaughter, M. T.; Sapp, D. S.; Cronkite, D. J.; Ball, R.; Floyd, J. S.

2026-06-08 health informatics 10.64898/2026.06.05.26354934 medRxiv
Top 0.5%
1.0%
Objective: Clinical phenotyping methods that rely on clinical and informatics expertise can be time-intensive and costly. We tested both manual and highly automated approaches using electronic health record (EHR) data to identify an FDA Sentinel Initiative health outcome of interest, acute pancreatitis. Materials and Methods: We trained and evaluated machine learning algorithms using EHR data with two approaches: a custom approach that included manually curated features and trained on outcomes data validated with medical record review, and a highly automated approach that greatly simplifies and automates feature engineering and relies on low-cost silver-standard outcomes for model training. Results: Custom algorithms using manually curated structured claims data discriminated cases from non-cases with a high degree of accuracy (cv-AUC 0.89 [95% CI 0.84-0.94]); the inclusion of natural language processing (NLP)-derived covariates from clinical notes increased performance slightly (cv-AUC 0.91 [95% CI 0.86-0.97]). The automated algorithm trained on the outcome count of diagnosis codes performed less well (AUC 0.80 [95% CI 0.75-0.85]) but improved using maximum lipase value as an outcome (AUC 0.88 [95% CI 0.84-0.92]). At a positive predictive value of 90%, the custom algorithm had a sensitivity of 92%, the automated algorithm trained on diagnosis code count had a sensitivity of 45%, and the automated algorithm trained on maximum lipase value had a sensitivity of 84%. However, a prediction rule derived by clinicians during chart review was nearly as accurate (maximum lipase value ≥3 times upper limit of normal; AUC 0.86, PPV 85%, sensitivity 92%). Discussion: Machine learning algorithms with manually curated structured data and NLP features trained on validated outcomes data successfully identified validated events. 
Use of an outcome in the automated model based on specific phenotype knowledge (maximum lipase value) allowed for performance similar to the custom model with considerably fewer resources.

6
Positioning Early Phase CNS Trials for Regulatory and Investor Success: Strategic Implications of the Single Phase 3 Approval Paradigm

Schmidt, P.; Preskorn, S.

2026-06-08 neurology 10.64898/2026.06.05.26353604 medRxiv
Top 0.8%
0.7%
In February 2026, the FDA announced that a single pivotal phase 3 (P3) trial would become the new default standard for drug approval - a regulatory direction that had been legally enabled since the FDA Modernization Act of 1997. This announcement has strategic, scientific, and economic implications for drug developers, contract research organizations (CROs), and biotech investors. We argue that the expansion of this framework, originally reserved for various niche submissions, represents a paradigm change that dramatically increases the value of rigorous early phase (P1 and P2) trial design and requires sponsors to establish both statistical efficacy signals and mechanistic biological understanding before entering phase 3. Using a CNS indication cost model, we show that single P3 approval can reduce total development expenditure from approximately $447 million over 14 years to $297 million over 12 years, saving $150 million and providing two additional years of commercial runway for a modeled CNS drug. Case examples including lecanemab, omaveloxolone, and tofersen illustrate how biomarker-informed early phase strategies can establish the confirmatory evidence necessary for single-trial approval. We provide practical guidance for maximizing the value of P1 and P2 under this evolving framework.

7
A Three-Tier Operational Benchmark for Evaluating Large Language Models on Hospital Medication Safety

Proulx, J.; Daines, B.; Barton, M.; Leonard, M. E.; Garcia, J. A.; Young, B.; Snell, Q.; West, T. W.; Watson, S. R.; AlQaseer, M.; Louiset, M.; Maqsood, M. B.; Voutt-Goos, M. J.; Douma, C.; Kasbekar, N.; Jeffries, J.; Abu-Rahmeh, W.; Frush, K.; Grewal, D. K.; Bahsoun, M.; Leonard, M.; Frankel, A.; Classen, D. C.; Pestotnik, S. L.

2026-06-10 health informatics 10.64898/2026.06.05.26354271 medRxiv
Top 0.9%
0.6%
Objective. To introduce PsiBench, a clinically validated medication-safety benchmark for evaluating large language models (LLMs) against the standards used to certify hospital computerized provider order entry (CPOE) and electronic health record (EHR) systems, and a non-overlapping three-tier evaluation framework separating highest-stakes discrimination, the operational CDS regime, and category-correct alerting. Materials and Methods. PsiBench comprises 492 medication-safety scenarios across 11 safety categories, created by clinical pharmacology experts whose work underpins an annualized testing procedure used by more than 2,000 U.S. hospitals. The three-tier framework partitions the scenarios non-overlappingly: Discrimination (98 scenarios, 50 fatal vs 48 deception, near-balanced 51%/49%); Operational (394 scenarios, 261 serious unsafe plus 133 safe including 41 Excessive Alerts reclassified as operational negatives); and Attribution (311 alert-required scenarios). We evaluated 40 frontier LLMs from 10 providers over 3 runs per scenario at temperature 0.2 (or the provider default where temperature is not configurable), yielding 59,040 evaluations conducted April 21-23, 2026. Results. Headline binary performance on the full benchmark spans a wide range across the 40 models: F1 78.5%-92.3%, accuracy 65.4%-89.8%, sensitivity 81.4%-100.0%, specificity 6.1%-81.8%. Leading models by F1 (o4-mini 92.3%; o3 92.2%) pair high sensitivity with meaningful specificity; three models saturate sensitivity at 100% but fall below 25% specificity, indistinguishable from a naive always-alert classifier. The wide spread on a single headline metric motivates tier-specific analyses, developed in a separate clinical paper. Discussion and Conclusion. PsiBench and the three-tier framework operationalize a rigorous evaluation rubric for LLM medication safety, grounded in two decades of national hospital audit experience. 
The framework generalizes to any binary medication-safety classifier (rule-based, conventional ML, or LLM-driven), supporting tier-aware model selection and post-deployment surveillance.
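
The evaluation count and the always-alert comparison in this abstract can be reproduced with simple arithmetic. The following is a minimal sketch, not the benchmark's own code: the helper name `binary_metrics` is hypothetical, and the always-alert confusion counts are an assumption built from the Operational tier sizes quoted above (261 unsafe, 133 safe).

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion counts (hypothetical helper)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Figures quoted in the abstract.
SCENARIOS = 492
MODELS = 40
RUNS_PER_SCENARIO = 3

# 492 scenarios x 40 models x 3 runs.
total_evaluations = SCENARIOS * MODELS * RUNS_PER_SCENARIO

# Discrimination (98) and Operational (394) tiers partition the scenarios.
tier_total = 98 + 394

# Assumption: a classifier that alerts on every Operational scenario
# flags all 261 unsafe cases (TP) and all 133 safe cases (FP).
always_alert = binary_metrics(tp=261, fp=133, tn=0, fn=0)

print(total_evaluations, tier_total)
print(always_alert)
```

Under this assumption the always-alert baseline reaches perfect sensitivity with zero specificity, which is why sensitivity alone cannot separate a saturated model from a trivial one.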

8
Quality and Safety profiles of AI-Generated vs Clinician-Generated Handoffs in Hospital Medicine

Shah, K. P.; Airan Javia, S.; Savage, T.; Bressman, E.

2026-06-08 health informatics 10.64898/2026.06.05.26354946 medRxiv
Top 1.0%
0.6%
End-of-rotation handoffs are critical for patient safety but add to documentation burden for hospitalists. Generative artificial intelligence (AI) may help automate handoff creation using electronic health record data, but its impact on quality and safety is unclear. Methods: We developed an AI handoff tool with a large language model using clinical notes as input and conducted a retrospective evaluation comparing AI-generated and clinician-authored handoffs. Handoffs were assessed across domains of quality and safety through a structured review. Results: Quality ratings were similar between AI and human handoffs (3.7 vs. 3.5, p=0.57). AI-generated handoffs were rated higher for organization (4.4 vs. 4.1, p=0.05) and completeness (4.1 vs. 3.6, p=0.01), but lower for conciseness (3.7 vs. 4.1, p=0.03) and accuracy (4.1 vs. 4.4, p=0.03). Error rates were comparable (0.3/handoff in both groups); however, AI-generated handoffs included inaccuracies (9% of AI errors) and hallucinations (1% of AI errors), while clinician-authored handoffs contained only omissions. Conclusion: Human and AI handoffs have differing error profiles and tradeoffs between completeness and conciseness. Prospective evaluation in clinical workflows is underway.

9
Next-Generation Skin Cancer Detection Using Efficient Fuzzy Fusion of Genomic and Imaging Data

Molla, A. R.; Maity, A.; Saha, S.; Bhattacharya, R.; Chakraborty, A.; Biswas, S.; Nath, S.

2026-06-08 health informatics 10.64898/2026.06.05.26355024 medRxiv
Top 1%
0.5%
Skin cancer requires early detection for improved survival rates. Most existing methods rely on deep learning based image classification, which is affected by visual similarity among lesions. Fewer studies use Gene Expression (GE) analysis, which captures molecular characteristics but lacks structural and visual details. To overcome limitations of individual modalities, this paper proposes a multimodal framework integrating dermoscopic images and GE profiles for skin cancer classification. EfficientNet and logistic regression are used for image based analysis and genomic skin lesion profiling, respectively, followed by fuzzy rule based decision systems to reduce uncertainty within individual modalities. Finally, fuzzy fusion combines predictions from both modalities using uncertainty based weighting of classifier outputs. The experimental findings show that both the image based and GE based classification models individually achieved accuracies of nearly 92%. However, the integration of prediction results through the proposed fuzzy fusion strategy further enhanced the classification performance, achieving an overall accuracy of 94.25%. The results obtained outperform contemporary methods, highlighting the effectiveness of combining complementary multimodal information compared with single modality approaches.

10
An Explainable Multimodal AI Framework with Reinforcement Learning for Post-Surgical Clinical Decision Support

Ahmed, M.; Ahmed, F.; Mow, S. M.; Taha, P. A.; Barua, S.; Rahman, M. M.; Rafy, A.; Mondol, S. M.; Faisal, M. I.

2026-06-10 health informatics 10.64898/2026.06.08.26355217 medRxiv
Top 1%
0.5%
Post-surgical adverse outcomes, including mortality, intensive care readmission, and complications, remain major challenges for clinical decision-making. Existing machine learning approaches focus on outcome prediction while operating as opaque systems, limiting clinical trust and the translation of predictions into treatment decisions, and many clinical studies rely on synthetic data in which shared intermediate variables create circular dependencies between inputs and targets that compromise reported performance. We aimed to develop an explainable multimodal architecture and a rigorous evaluation methodology that address these gaps. We designed a two-stage architecture integrating supervised deep learning for risk prediction with conservative Q-learning for action recommendation. The first stage uses five modality-specific encoders for structured records, physiological time-series, chest radiographs, clinical notes, and surgical metadata, unified through cross-modal attention into a shared patient-state representation. The second stage applies offline reinforcement learning to recommend clinical actions while preventing value overestimation. We formally characterized a target-leakage flaw in synthetic pipelines and propose a real-data methodology using a verified clinical database, with event-censored temporal separation and uncertainty-weighted per-task training. Component-level behavior was validated on a controlled synthetic benchmark, demonstrating that the architecture functions as designed without claiming clinical validity. The cross-modal attention and risk-prediction components behaved as expected, whereas the offline reinforcement learning stage did not converge on the benchmark, indicating that value estimation requires further investigation on real clinical data. 
The architecture provides dual-level explainability through attention visualization and value decomposition, contributing a deployable design, a formal methodological critique of synthetic-data practices, and a complete framework for clinically valid evaluation.

11
Identifying Clinical Diagnostic Trajectories Associated With Suicide Death Using Temporal Sequence Mining of Linked Claims and Mortality Data

Belouali, A.; Kitchen, C.; Haroz, E.; Lehmann, H.; Nestadt, P. S.; Wilcox, H. C.; Kharrazi, H.

2026-06-10 health informatics 10.64898/2026.06.08.26355231 medRxiv
Top 1%
0.5%
Background: Most approaches to suicide risk assessment consider clinical conditions as independent risk factors, potentially overlooking prognostic information in the order in which conditions accumulate. We applied temporal sequence mining to linked claims and mortality data to identify ordered clinical diagnostic trajectories associated with suicide death. Results: The cohort included 3 647 059 insured Maryland residents aged 10 years or older with available claims records in the Maryland Suicide Data Warehouse from January 1, 2016, to December 31, 2020, among whom 768 suicide deaths were ascertained through medical examiner linkage. Sequential pattern mining of ICD-10-CM diagnoses grouped into Clinical Classifications Software Refined categories identified 89 221 candidate sequences, of which 1 816 remained significantly associated with suicide death in time-varying Cox models. Adjusted hazard ratios (AHRs) ranged from 2.4 to 134.1. Two-thirds of significant trajectories ended in physical conditions, and approximately half crossed from psychiatric to physical endpoints. Among suicide decedents, 62% were exposed to at least 1 significant sequence (median, 16 per case); median sequence duration was 18.7 months, and median time from completion to death was 13.1 months. In landmark analyses, among patients with depression who later developed suicidal ideation (n = 26 356), the path through anxiety, then anemia, was associated with higher risk (AHR, 4.6; 95% CI, 2.2-9.5), whereas the anxiety-only path was not (AHR, 1.3; 95% CI, 0.8-2.1). Among patients with anxiety who later developed hypertension (n = 149 215), the path through history of self-harm was associated with higher risk (AHR, 32.0; 95% CI, 16.6-61.6). Associations were generally consistent across sex and age. Conclusions: Temporal ordering of clinical conditions may carry prognostic information for suicide death. 
Clinical trajectories incorporating physical illness within psychiatric sequences identified higher-risk groups. These findings suggest that opportunities for risk detection may extend beyond psychiatric settings and that suicide risk signals may be fragmented across care settings and not apparent within isolated encounters.

12
Development of iADJUST: a theory-informed, patient co-designed digital psychological intervention for adjustment in chronic kidney disease

Schmill, P.; Hudson, J.; Greenwood, S.; Chilcot, J.

2026-06-11 psychiatry and clinical psychology 10.64898/2026.06.10.26355356 medRxiv
Top 1%
0.4%
Background: Psychological distress is common in chronic kidney disease (CKD) and is associated with reduced quality of life, treatment non-adherence, and worse clinical outcomes. Distress in CKD is also linked to difficulties adjusting to the demands of illness management. Despite this, psychological support remains inconsistently integrated within kidney care pathways, and existing interventions often lack clear theoretical specification and explicit targeting of mechanisms underpinning adjustment to CKD. Objectives: To describe the systematic development of iADJUST, a theory-informed patient co-designed digital psychological intervention targeting key cognitive and behavioural mechanisms involved in adjustment to CKD. Methods: Intervention development was guided by the Medical Research Council framework for complex interventions. A structured, iterative process integrated empirical evidence, psychological theory, and patient and public involvement and engagement. The Common-Sense Model of Self-Regulation and cognitive behavioural theories informed the identification of modifiable maintaining mechanisms associated with adjustment to CKD. Intervention components were mapped onto these mechanisms and refined through co-design with people living with CKD. Results: iADJUST is a six-session self-guided digital psychological intervention delivered over 12 weeks and supplemented by therapist contact. The intervention targets illness-related uncertainty, fatigue-related activity dysregulation, catastrophic what-if thinking, self-critical evaluation, and behavioural withdrawal. It integrates psychoeducation, cognitive and behavioural strategies, maintenance planning, and elements from acceptance and commitment therapy and compassion-focused approaches. Content is delivered through video, audio, and guided tasks and activities. 
Conclusion: iADJUST provides a theory-informed, evidence-based psychological intervention for CKD explicitly mapping intervention components to maintaining cognitive and behavioural mechanisms implicated in adjustment. Feasibility evaluation is underway.

13
An AI-assisted feasibility evaluation of three photoplethysmography-derived microvascular reactivity signals in MIMIC-IV-WDB v0.1.0

Landry, T. C.; Kim, Y.

2026-06-06 health informatics 10.64898/2026.06.03.26354863 medRxiv
Top 1%
0.4%
Background. Capillary refill time, an examiner-dependent bedside test of distal microvascular perfusion, has become a resuscitation target in septic shock [1-4], motivating a continuous surrogate computed from the photoplethysmogram (PPG, the optical waveform the pulse oximeter on every ICU patient already records) [5-8]. Objective. We attempted three PPG-derived candidate measures on the MIMIC-IV Waveform Database (MIMIC-IV-WDB v0.1.0) and asked, by inspecting randomly drawn examples, whether each captured its intended physiology before any downstream modeling. Methods. MIMIC-IV-WDB v0.1.0 [9] was linked to MIMIC-IV [10]. The signals were a cuff-anchored perfusion-index recovery (reactive hyperemia when the cuff shares an arm with the probe), a slow Mayer-wave-band power ratio of the perfusion index (sympathetic vasomotor tone), and a per-beat diastolic exponential decay time constant (a refill-like recovery time). For each signal we drew 10 random examples at a fixed seed and checked them against a checklist fixed in advance. Each was read by the author and, separately, by MedGemma 1.5, a multimodal medical language model run locally. A synthetic test with a known time constant checked the third signal. Results. The cuff-anchored signal showed the expected occlusion-reperfusion shape on 268 of 6,236 evaluable cuff cycles (4.30%) in 15 of 19 patients, consistent with opposite-limb placement of the probe and cuff. The slow-band ratio returned a stable cohort value, but a clear, stationary peak appeared in only 4 of 10 random windows. The per-beat fit met its goodness-of-fit threshold in 10 of 10 beats, yet a cardiac-frequency heuristic flagged a possible fit on the heart-rate oscillation in 7 of 10, and in 5 of 17 patients the time constant lay where an exponential is indistinguishable from a straight line. A 0.5 Hz high-pass pre-filter implanted its own approximately 318 ms time constant regardless of truth. 
The language model tracked the human on clear positives but reported the pattern present on every call it returned, never absent. Conclusions. Two of the three candidate signals did not reflect their intended physiology in most examples, and the third was constrained by sensor placement. Inspecting a few random raw inputs against a checklist written in advance is an inexpensive upstream check before downstream inference on PPG-derived microvascular signals.
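
The approximately 318 ms figure quoted for the 0.5 Hz high-pass pre-filter matches the time constant of a first-order filter, tau = 1 / (2 * pi * f_c). The sketch below assumes a single-pole high-pass, which the abstract does not state; the function name `highpass_time_constant` is hypothetical.

```python
import math


def highpass_time_constant(cutoff_hz: float) -> float:
    """Time constant in seconds of a first-order filter with the given cutoff.

    Assumption: single-pole (RC-style) response, tau = 1 / (2 * pi * f_c).
    """
    if cutoff_hz <= 0:
        raise ValueError("cutoff frequency must be positive")
    return 1.0 / (2.0 * math.pi * cutoff_hz)


# Cutoff quoted in the abstract.
tau_ms = highpass_time_constant(0.5) * 1000.0
print(f"{tau_ms:.1f} ms")
```

A decay fitted after such a filter can therefore return a value near this constant whatever the underlying physiology, which is consistent with the caution the abstract raises.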

14
Assessment of the accuracy of lung lesions diagnosis in adolescents with osteosarcoma using artificial intelligence

Uskova, N. G.; Gombolevskiy, V. A.; Chernina, V. Y.; Burenchev, D. V.; Akhaladze, D. G.; Panina, E. V.; Karachunskiy, A. I.; Tereschenko, G. V.; Goncharov, M. Y.; Soboleva, E. A.; Konopleva, E. I.; Bydanov, O. I.; Plekhov, S. Y.; Grachev, N. S.

2026-06-10 radiology and imaging 10.64898/2026.06.08.26354011 medRxiv
Top 1%
0.4%
Background. Lung metastases are the main cause of death in osteosarcoma (OS). Accurate detection of nodules on computed tomography (CT) of the lungs is critically important for determining the disseminated stage of the disease and planning surgical treatment. The use of artificial intelligence (AI) in the search for lung nodules increases the accuracy of diagnosis and reduces the chance of missing metastases. Objective: to evaluate the accuracy of lung nodule diagnosis in adolescents with OS using AI. Methods. A retrospective assessment of CT scans of adolescents with OS was performed. A pathological nodule with an average size of ≥4 mm was considered a target finding. The diagnostic accuracy of an AI algorithm previously trained on an adult dataset was evaluated, and the number of false positives (FP) and false negatives (FN) was determined. Sensitivity, specificity, accuracy, area under the ROC curve (AUC), positive predictive value, negative predictive value, and F1-measure were calculated. Based on the obtained results, the effectiveness of the algorithm was assessed. Results. 248 CT scans of adolescents with OS were evaluated. The AI algorithm produced an FP result in 5 cases (2.02%), an FN result in 34 cases (13.71%), and a correct result (true positive or true negative) in 209 cases (84.27%). The diagnostic accuracy of the algorithm was 0.843 (95% CI 0.794-0.887). Applying the AI algorithm in a radiologist's practice for this specific clinical task would increase sensitivity from 0.805 to 0.891, with an absolute decrease in FN results of 8.59% and a relative decrease of 44%. Conclusion. The obtained results confirm the practical value of the AI algorithm and justify the implementation of AI-assisted systems in the diagnostic protocols for lung metastases in adolescents with OS.
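
The percentages in this abstract follow from the reported counts. The sketch below recomputes them; reading the false-negative reduction as a change in miss rate (1 minus sensitivity) is an assumption, and because it starts from the rounded sensitivities it gives an absolute drop of about 8.6 points, not exactly the 8.59% reported.

```python
# Counts quoted in the abstract.
TOTAL_SCANS = 248
FALSE_POSITIVES = 5
FALSE_NEGATIVES = 34
CORRECT = 209

fp_pct = 100 * FALSE_POSITIVES / TOTAL_SCANS
fn_pct = 100 * FALSE_NEGATIVES / TOTAL_SCANS
accuracy_pct = 100 * CORRECT / TOTAL_SCANS

# Sensitivities quoted for the reader alone and with AI assistance.
SENS_WITHOUT_AI = 0.805
SENS_WITH_AI = 0.891

# Assumption: the FN reduction is the change in miss rate (1 - sensitivity).
miss_before = 1 - SENS_WITHOUT_AI
miss_after = 1 - SENS_WITH_AI
absolute_drop_pct = 100 * (miss_before - miss_after)
relative_drop_pct = 100 * (miss_before - miss_after) / miss_before

print(round(fp_pct, 2), round(fn_pct, 2), round(accuracy_pct, 2))
print(round(absolute_drop_pct, 1), round(relative_drop_pct))
```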

15
A Heterogeneous Graph Neural Network Framework for Multi-Horizon Stroke Mortality Prediction

Tharzeen, A.; Vafaei Sadr, A.; Radfar, N.; Hwang, W.; Abedi, V.; Zand, R.

2026-06-10 health informatics 10.64898/2026.06.09.26355176 medRxiv
Top 1%
0.3%
Background: Machine learning models for stroke mortality prediction typically treat each time horizon independently and use flat tabular features that ignore the relational structure of electronic health records (EHRs). In this pilot study, we leveraged graph-based machine learning models to predict post-stroke all-cause mortality across three different time horizons. Methods: We developed Stroke Temporal Heterogeneous Graph (StrokeTHG), a heterogeneous graph neural network model for simultaneous multi-horizon stroke mortality prediction (30-day, 90-day, 1-year) using EHR data from Penn State Health System. The model encodes various relations among EHR entities (e.g., patient, diagnosis, comorbidity) and temporal encoding of admission time to better predict stroke mortality. We compared our proposed approach against various baseline methods, including Logistic Regression, Random Forest, and XGBoost. We also performed ablation and subgroup analyses, evaluated the quality of learned graph embeddings, and assessed the importance of different edge types in the graph. Results: We included 4,144 stroke patients (mean age 69.2 years; 54.3% men), of whom 3,332 (80.4%) survived their stroke after one year. 30-day, 90-day, and 1-year mortality rates were 9.7%, 13.7%, and 19.6%, respectively. Our proposed approach, StrokeTHG, achieved AUROC of 0.872, 0.878, and 0.837 across horizons, outperforming all tabular baselines. At ≥75% specificity, the model identified 5-10 percentage points more mortality cases than the best baseline at each horizon. Subgroup analysis demonstrated consistent performance across sex subgroups and the largest discriminative gains in the Age 65-80 stratum. Edge-type ablation identified phenotype-patient and admission-patient edges in the constructed EHR graph as the most influential relational edges for mortality prediction. 
StrokeTHG embeddings outperformed all graph and matrix factorization baselines under an identical downstream classifier, confirming that performance gains stem from representation quality rather than classifier capacity. Conclusions: StrokeTHG demonstrates that heterogeneous graph representations of EHR data provide a consistent improvement over flat tabular models for multi-horizon stroke mortality prediction, with particular advantage at clinically actionable sensitivity thresholds and novel multi-horizon monotonic prediction capability. This methodological framework may be adaptable to other EHR-based clinical research studies seeking to leverage heterogeneous relational structures for predictive modeling.
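
The "cases identified at ≥75% specificity" comparison above can be illustrated with a small sketch. This is not the StrokeTHG evaluation code; the function name `sensitivity_at_specificity` and the toy labels and scores are assumptions made for illustration. It scans every observed score as a decision threshold and keeps the highest sensitivity among thresholds whose specificity meets the floor.

```python
def sensitivity_at_specificity(y_true, scores, min_specificity=0.75):
    """Highest sensitivity achievable with specificity >= min_specificity.

    y_true: iterable of 0/1 labels (1 = died within the horizon).
    scores: predicted risk, higher means more likely to be positive.
    Hypothetical helper for illustration only.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best = 0.0  # predicting nobody positive gives specificity 1, sensitivity 0
    # Try each observed score as a threshold: predict positive when score >= threshold.
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        specificity = 1 - fp / negatives
        if specificity >= min_specificity:
            best = max(best, tp / positives)
    return best


# Toy data (made up): four survivors followed by four deaths.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
toy_result = sensitivity_at_specificity(toy_labels, toy_scores, 0.75)
```

Comparing two models on this quantity, at the same specificity floor, gives the "percentage points more mortality cases identified" style of statement used in the abstract.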

16
Dementia and Frailty Impact Postoperative Care Trajectories and Burden among Older Adults Undergoing Radical Cystectomy for Bladder Cancer

Ernandez, J.; Xiang, L.; Adler, R.; Hsu, J.; Shah, S. K.; Kim, D.; Gershman, B.; Mossanen, M.; Weissman, J. S.

2026-06-06 urology 10.64898/2026.06.04.26354768 medRxiv
Top 2%
0.3%

OBJECTIVE: Bladder cancer (BC) is predominantly a disease of older, comorbid adults, and radical cystectomy (RC), which is the gold standard treatment, carries considerable morbidity. We sought to determine the impact of baseline dementia and frailty on the care trajectory beyond the immediate postoperative period. We hypothesized that frail patients and those with dementia undergoing RC for BC would have poorer care trajectories. METHODS AND MATERIALS: We identified Medicare beneficiaries ≥66 years old who underwent RC for BC in 2017 with 12 months of pre- and post-RC enrollment. Frailty and dementia were characterized using validated, claims-based measures. Associations of baseline frailty and dementia with postoperative care trajectory outcomes were determined using Fine-Gray competing risk models. RESULTS: We identified 3,600 beneficiaries, of whom 11.6% were frail and 3.4% met criteria for dementia. Patients with dementia were more likely to be frail, comorbid, and not receive standard-of-care neoadjuvant chemotherapy. Frailty was independently associated with ≥2 transitions in care level after index discharge from RC, skilled nursing facility (SNF) admissions within 1 year of RC, exposure to intensive post-RC interventions (including dialysis and feeding tube placement), and poorer survival. Dementia remained associated with SNF admissions regardless of frailty level. CONCLUSIONS: Among a contemporary cohort of older adults undergoing RC for BC, preoperative dementia and frailty were independently associated with poorer care trajectories beyond the immediate postoperative period after RC. Our work highlights a role for preoperative geriatric assessment in identifying and optimizing patients at greatest risk.

17
Adapting a Regulation of Craving Magnetic Resonance Imaging Task to Generate Functional Repetitive Transcranial Magnetic Stimulation Targets for the Ventromedial and Dorsolateral Prefrontal Cortex in Treatment-Seeking Participants with Cannabis Use Disorder

Geoly, A.; McCalley, D. M.; Struckmann, W.; Azeez, A.; Wong, B.; Kim, B.; Ninomiya, S.; Ahmed, S.; Kim, J. P.; McRae-Clark, A. L.; Froeliger, B.; Sahlem, G. L.

2026-06-06 addiction medicine 10.64898/2026.06.04.26353616 medRxiv
Top 2%
0.3%

Background: Repetitive Transcranial Magnetic Stimulation (rTMS) is a promising treatment across addictive disorders including Cannabis Use Disorder (CUD). Targeting incentive-salience circuitry via the ventromedial prefrontal cortex (vmPFC) and central-executive circuitry via the left dorsolateral prefrontal cortex (LDLPFC) are both promising treatment approaches; however, to date structural targets have predominated, whereas functional targeting may allow for more precision. In this pilot trial we adapted a functional Magnetic Resonance Imaging (fMRI) Regulation of Craving (ROC) task to generate fMRI-based rTMS targets in the vmPFC and LDLPFC. Methods: We recruited treatment-seeking participants with moderate or severe CUD as part of an open-label trial and administered an adapted ROC task during fMRI following 24 hours of cannabis abstinence. We identified sub-portions of maximal activation of the LDLPFC when participants thought of long-term consequences of cannabis use (Later) and of the vmPFC when participants thought of short-term positive aspects of cannabis use (Now). We hypothesized that our task would generate acceptable rTMS targets in >66% of baseline fMRI scans. Results: A total of 20 participants enrolled in the trial (50% F, age=33.3±9.8) and completed the baseline fMRI. The adapted ROC task elicited group-level activation in the LDLPFC and precuneus in the Later>Now contrast and in the bilateral vmPFC, ACC, and striatum in the Now>Later contrast. Acceptable functional targets resolved in both the vmPFC and LDLPFC in 19 of 20 participants (one participant did not tolerate MRI). Conclusions: The adapted ROC task elicits activation in incentive-salience and central-executive circuitry and can feasibly generate rTMS targets when using a cluster selection algorithm.

18
Behavioral and Functional Neuroimaging Effects of Delivering a Course of Repetitive Transcranial Magnetic Stimulation to Personalized Targets Within the Ventrolateral Or Dorsolateral Prefrontal Cortex in Treatment-Seeking Participants with Cannabis Use Disorder

McCalley, D.; Wong, B.; Geoly, A.; Struckman, W.; Azeez, A.; Kaloiani, I.; Kim, B.; Ninomiya, S.; Ehrie, J.; Austelle, C. W.; Rolle, C. E.; Kim, J. P.; Froeliger, B.; McRae-Clark, A. L.; Sahlem, G.

2026-06-10 addiction medicine 10.64898/2026.06.08.26355193 medRxiv
Top 2%
0.3%

Background: Repetitive Transcranial Magnetic Stimulation (rTMS) is a promising treatment across addictive disorders including Cannabis Use Disorder (CUD). Stimulation of two rTMS targets, the ventromedial prefrontal cortex (vmPFC) and the left dorsolateral prefrontal cortex (LDLPFC), limbic and executive control network hubs respectively, may yield differential effects. In this pilot trial, we explored the differential effects of 36 sessions of rTMS applied to either the vmPFC or LDLPFC. Methods: Treatment-seeking participants with moderate or severe CUD (n=20, 10 F, age=33.3±9.8 SD) were randomized to 36 sessions of open-label rTMS (two sessions per visit, two or three visits per week) to either the LDLPFC (3000 pulses; 10 Hz) or vmPFC (900 pulses; 1 Hz) using personalized functional Magnetic Resonance Imaging (fMRI) targets, along with three sessions of Motivational Enhancement Therapy. At baseline and following rTMS, the Time-Line Follow-Back was used to measure days per week of cannabis use, and the fMRI Regulation of Craving (ROC) task was used to measure network activation to cues associated with long-term negative ('Later') and short-term positive ('Now') consequences of cannabis use. Results: Eighty percent of participants completed study rTMS. There was a significant decrease in days per week of cannabis use in both groups (vmPFC: d=7.9; LDLPFC: d=3.1) between the four weeks of baseline and seven weeks of follow-up. LDLPFC-rTMS reduced fMRI BOLD signal magnitude and increased LDLPFC functional connectivity in response to cues, while vmPFC-rTMS reduced functional connectivity. Conclusions: Treatment-seeking participants with CUD reduced the number of days per week they used cannabis when receiving rTMS applied to either the LDLPFC or vmPFC, while fMRI effects differed by treatment target. Future larger sham-controlled trials are needed for efficacy and biomarker determination.

19
Beyond event-rate enrichment: proteomic risk scores for mechanism-aware prevention trial design

Fieggen, J.; Simond, G.; Segal, B. M.; Noori, A.; Thakurta, A.; Butler, C. C.; Clifton, D. A.; Clifton, L.

2026-06-10 health informatics 10.64898/2026.06.09.26355266 medRxiv
Top 2%
0.3%

Background. Blood-based biomarkers are increasingly proposed for identifying high-risk individuals before clinical disease and for making prevention-oriented trials more efficient. Prognostic enrichment can increase event rates, but trial efficiency also depends on whether the intervention effect is preserved in the enriched population. Methods. Using the UK Biobank Pharma Proteomics Project, we trained disease-specific proteomic risk scores (ProRS) from 2,916 plasma proteins with elastic-net Cox models. We compared ProRS, polygenic risk scores (PRS), and combined PRS-ProRS scores across ten incident diseases. We estimated cumulative incidence and theoretical two-arm time-to-event trial sample sizes across risk strata. To evaluate effect preservation, we examined six intervention-analogue exposure-outcome pairs spanning genetic (PCSK9/coronary artery disease, APOE/Alzheimer's disease, PPARG/type 2 diabetes, IL23R/Crohn's disease), behavioural (physical activity/all-cause mortality), and pharmacological (RAAS inhibitors versus calcium channel blockers/coronary artery disease) examples. Results. ProRS outperformed PRS for 9 of 10 diseases (median C-index 0.75 versus 0.61). ProRS and PRS were weakly correlated (median Pearson |r| = 0.04), and joint PRS-ProRS stratification identified groups with higher observed incidence than either score alone for several endpoints. In the top risk quartile, combined-score enrichment reduced theoretical required sample sizes by 32-74% under a fixed 20% relative hazard reduction. These gains were not always preserved when stratum-specific intervention-analogue effects were used. Effects were broadly preserved for APOE/Alzheimer's disease and physical activity/mortality. The PPARG/type 2 diabetes effect attenuated toward the null under all three score types, showing that event-rate enrichment does not guarantee effect preservation. 
For IL23R/Crohn's disease and the antihypertensive comparison, point estimates differed across score types (preserved under polygenic but attenuated under proteomic enrichment), but confidence intervals were wide and overlapping. Conclusions. Proteomic risk scores can identify high-event-rate populations for prevention-oriented trials, but event-rate enrichment alone is insufficient for trial design. Biomarker-guided enrichment should evaluate mechanism-specific effect preservation and may be preferable as a stratification or adaptive-design variable rather than as a restrictive eligibility criterion.
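
To make the link between event-rate enrichment and sample size concrete, here is a minimal sketch using Schoenfeld's approximation for a 1:1 two-arm time-to-event trial. The abstract does not say which formula the authors used, so the choice of approximation, the function names, and the example incidences are all assumptions for illustration. A fixed 20% relative hazard reduction corresponds to a hazard ratio of 0.8.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld approximation: events needed for a two-sided log-rank test
    with 1:1 allocation. Assumed formula, not necessarily the authors' method."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 4 * z ** 2 / math.log(hazard_ratio) ** 2


def total_sample_size(control_incidence, hazard_ratio, alpha=0.05, power=0.80):
    """Total participants needed, given cumulative incidence in the control arm
    over the trial follow-up (a hypothetical input)."""
    # Under proportional hazards, treated-arm survival is control survival ** HR.
    treated_incidence = 1 - (1 - control_incidence) ** hazard_ratio
    mean_event_prob = (control_incidence + treated_incidence) / 2
    return math.ceil(required_events(hazard_ratio, alpha, power) / mean_event_prob)


# Illustrative (made-up) incidences: unselected population vs. top risk stratum.
n_unselected = total_sample_size(control_incidence=0.05, hazard_ratio=0.8)
n_enriched = total_sample_size(control_incidence=0.20, hazard_ratio=0.8)
```

The number of required events depends only on the hazard ratio (roughly 631 events for a hazard ratio of 0.8 at 80% power), so a higher-incidence stratum needs fewer participants to accrue them. If the hazard ratio itself moves toward 1 in the enriched stratum, `required_events` grows and can cancel that gain, which is the effect-preservation concern raised in the abstract.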

20
Multimodal neuroimaging approach for cognitive impairment in Alzheimer disease

Gonzales, M.; Kang, X.; Adamson, M. M.; Chao, S. Z.; Yoon, B. C.

2026-06-06 radiology and imaging 10.64898/2026.06.04.26354924 medRxiv
Top 2%
0.3%

PURPOSE: Alzheimer disease (AD) is associated with cognitive impairment, brain atrophy, and elevated amyloid-beta and tau. The study aimed to characterize regional atrophy associated with elevated amyloid-beta and tau, as measured by [18F]florbetapir (FBP) and [18F]flortaucipir (FTP) positron emission tomography (PET), respectively, and determine whether combining PET and atrophy data improves the prediction of cognitive impairment. METHODS: Alzheimer Disease Neuroimaging Initiative data (n = 381) were retrospectively analyzed. PET results were correlated with cortical thickness, gray matter (GM) volumes, Mini-Mental State Examination, and Montreal Cognitive Assessment. Linear/logistic regression and area under the curve (AUC) were used to evaluate for significant correlations and compare performances in distinguishing cognitive impairment, respectively. RESULTS: Incremental loss of cortical thickness and GM volume was observed from FBP-/FTP- (n = 205) to single PET-positive (FBP+/FTP-, n = 133; FBP-/FTP+, n = 5) and FBP+/FTP+ (n = 38) groups, particularly in the temporal and parietal lobes. FBP+/FTP+ showed the most severe cortical thickness loss in the entorhinal cortex, temporal lobe GM atrophy, and cognitive impairment. Adding brain atrophy as the third variable resulted in higher odds ratios and improved AUCs for cognitive impairment, with FBP+/FTP+/temporal GM or entorhinal cortical atrophy+ demonstrating the strongest associations with cognitive impairment. CONCLUSION: A multimodal approach combining PET and MRI may help improve the assessment of cognitive impairment in AD.